Analysis of Microbiome Data in the Presence of Excess Zeros

Authors

  • Abhishek Kaul
  • Siddhartha Mandal
  • Ori Davidov
  • Shyamal D. Peddada
Abstract

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy for handling these excess zeros is to add a small number, called a pseudo-count (e.g., 1). Other strategies use various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, it is not ideal, as demonstrated in this paper. On the other hand, methods that model excess zeros using a probability model often implicitly assume that all zeros can be explained by a common probability model. As described in this article, this assumption is not always appropriate, because there are potentially three types (sources) of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate the three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure group). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data.

Results: Using extensive simulation studies, we demonstrate that the proposed methodology controls the false discovery rate (FDR) at the desired level of significance while competing well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test, which ignores the underlying simplex structure in the data, has an inflated FDR.


Similar resources

Next Generation Sequencing and its Application in the Study of Microbiome in Plant Diseases Suppressive Soils

Progress in next-generation sequencing has played a significant role in ecological studies of microbial populations. These advances have led to a rapid evaluation in metagenomics studies (analysis of DNA of microbial communities without the need to culture). Many statistical and computational tools and metagenomics databases have led to the discovery of huge amounts of data. In this research, i...


Analysis of High Dimensional Compositional Data Containing Structural Zeros with Applications to Microbiome Data

This paper is motivated by the recent interest in the analysis of high dimensional microbiome data. A key feature of this data is the presence of ‘structural zeros’ which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are insufficient to model these structural zeros. We define a general ...


Hurdle, Inflated Poisson and Inflated Negative Binomial Regression Models for Analysis of Count Data with Extra Zeros

In this paper, we propose hurdle regression models for analysing count responses with extra zeros. Maximum likelihood is used to estimate the model parameters. An application of the proposed model to an insurance dataset is presented. In this example, many of the claim counts are equal to zero, which illustrates the application of the model with a zero-inflat...


A Metagenomic Analysis of Lung Microbiome in Chemically Injured and Healthy Individuals

Background and Aim: The role of the lung microbiome in respiratory complications associated with chemicals such as sulfur mustard or chlorine gas has yet to be determined. The aim of this study was to compare the structure and composition of the lung microbiome in chemically injured and healthy individuals in order to understand the relation between the population of the lung microbiota and res...


Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data

Typical data in a microbiome study consist of the operational taxonomic unit (OTU) counts that have the characteristic of excess zeros, which are often ignored by investigators. In this paper, we compare the performance of different competing methods to model data with zero inflated features through extensive simulations and application to a microbiome study. These methods include standard para...



Volume 8

Publication date: 2017